WHAT IS CLAIMED IS: 



1. A speech processing apparatus comprising:

generation means for generating a pseudo 
acoustic echo signal based on a current impulse 
response simulating an acoustic echo transfer path 
and on a source signal;

supply means for holding the current impulse 
response and supplying the current impulse 
response to said generation means;

elimination means for subtracting said pseudo 
acoustic echo signal from a microphone input
signal to remove an acoustic echo component and
thereby generate an acoustic echo-canceled signal; 

update means for continually updating the
impulse response by using said source signal, said
acoustic echo-canceled signal and the current
impulse response held by said supply means and for
supplying the updated impulse response to said
supply means;

decision means for checking, in each frame,
whether or not a voice is included in the
microphone input signal, by using time domain
information and frequency domain information of
said acoustic echo-canceled signal;

storage means for storing one or more impulse
responses; and

control means for, in a frame for which the
result of decision made by said decision means is
negative, storing in said storage means the
current impulse response held by said supply means
and, in a frame for which the result of decision
is positive, retrieving one of the impulse
responses stored in said storage means and
supplying it to said supply means.
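
By way of illustration only, the cooperation of the elements recited in claim 1 may be sketched as follows. The claim does not name an adaptation algorithm, a voice detector, or a rule for choosing which stored impulse response to retrieve; the sketch assumes a normalized LMS (NLMS) update, a simple energy and spectral-flatness detector, and retrieval of the oldest entry of a short buffer. All names (`EchoCanceller`, `simple_vad`, `depth`, `mu`, the thresholds) are hypothetical and are not taken from the claims.

```python
from collections import deque

import numpy as np


def simple_vad(frame, energy_thresh=1e-3, flatness_thresh=0.5):
    """Hypothetical voice decision using one time-domain and one frequency-domain cue."""
    energy = np.mean(frame ** 2)                        # time-domain information
    power = np.abs(np.fft.rfft(frame)) ** 2 + 1e-12     # frequency-domain information
    flatness = np.exp(np.mean(np.log(power))) / np.mean(power)
    return energy > energy_thresh and flatness < flatness_thresh


class EchoCanceller:
    def __init__(self, taps, depth=2, mu=0.5, eps=1e-8):
        self.h = np.zeros(taps)            # current impulse response (supply means)
        self.store = deque(maxlen=depth)   # stored impulse responses (storage means)
        self.buf = np.zeros(taps)          # recent source samples, newest first
        self.mu, self.eps = mu, eps

    def process_frame(self, source, mic, voice_detector=simple_vad):
        out = np.empty(len(mic))
        for n, (x, d) in enumerate(zip(source, mic)):
            self.buf = np.roll(self.buf, 1)
            self.buf[0] = x
            echo_hat = self.h @ self.buf   # generation means: pseudo acoustic echo
            e = d - echo_hat               # elimination means: echo-canceled sample
            out[n] = e
            # update means: NLMS step (an assumption; the claim names no algorithm)
            self.h = self.h + self.mu * e * self.buf / (self.buf @ self.buf + self.eps)
        if voice_detector(out):            # decision means, applied per frame
            if self.store:
                # control means, positive decision: restore a stored response
                # (oldest entry chosen here as an assumption)
                self.h = self.store[0].copy()
        else:
            # control means, negative decision: store the current response
            self.store.append(self.h.copy())
        return out
```

In this sketch, restoring a response saved during a frame without voice discards any adaptation performed while the near-end talker was active, which is one plausible reading of why the claim pairs the decision means with the storage means.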

2. A speech processing apparatus as claimed in 
claim 1, wherein said acoustic echo-canceled 
signal is used for speech recognition. 

3. A speech processing apparatus as claimed in 
claim 2, further comprising: 

means for determining a spectrum for each 
frame by performing the Fourier transform on said 
acoustic echo-canceled signal; 

means for successively determining a spectrum 
mean for each frame based on the spectrum 
obtained; and

means for successively subtracting the 
spectrum mean from the spectrum calculated for 
each frame from said acoustic echo-canceled signal 
to remove additive noise of an unknown source. 
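
As a non-limiting sketch of the processing recited in claim 3, the spectrum mean may be maintained as a running average that is refreshed with every frame and then subtracted from that frame's spectrum. The use of the magnitude spectrum, the cumulative form of the average, and the flooring of negative results at zero are assumptions; the function name and parameters are hypothetical.

```python
import numpy as np


def spectral_mean_subtraction(frames, n_fft=256, floor=0.0):
    """Subtract a successively updated spectrum mean from each frame's spectrum."""
    mean = np.zeros(n_fft // 2 + 1)
    cleaned = []
    for k, frame in enumerate(frames, start=1):
        spec = np.abs(np.fft.rfft(frame, n=n_fft))    # spectrum of this frame
        mean += (spec - mean) / k                     # running mean over frames 1..k
        cleaned.append(np.maximum(spec - mean, floor))  # floored difference
    return np.array(cleaned)
```

A stationary additive component contributes nearly the same amount to every frame's spectrum and therefore to the mean, so subtracting the mean attenuates it without requiring knowledge of its source.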

4. A speech processing apparatus as claimed in
claim 2, further comprising: 

means for determining a spectrum for each 
frame by performing the Fourier transform on said 
acoustic echo-canceled signal;

means for successively determining a spectrum 
mean for each frame based on the spectrum 
obtained; 

means for successively subtracting the 
spectrum mean from the spectrum calculated for 
each frame from said acoustic echo-canceled 
signal; 

means for determining a cepstrum from the 
spectrum, the spectrum being removed of the 
additive noise of an unknown source by said 
subtraction means; 

means for determining for each talker a 
cepstrum mean of a speech frame and a cepstrum 
mean of a non-speech frame, separately, from the
cepstrums obtained; and 

means for subtracting the cepstrum mean of 
the speech frame of each talker from the cepstrum 
of the speech frame of the talker and for 
subtracting the cepstrum mean of the non-speech
frame of each talker from the cepstrum of the non-
speech frame of the talker to correct
multiplicative distortions that are dependent on
microphone characteristics and spatial transfer
characteristics from the mouth of the talker to
the microphone.

5. A speech processing apparatus as claimed in 
claim 2, further comprising: 

means for determining a spectrum for each
frame by performing the Fourier transform on said
acoustic echo-canceled signal;

means for determining a cepstrum from the
spectrum obtained;

means for determining for each talker a
cepstrum mean of a speech frame and a cepstrum
mean of a non-speech frame, separately, from the
cepstrums obtained; and

means for subtracting the cepstrum mean of
the speech frame of each talker from the cepstrum
of the speech frame of the talker and for
subtracting the cepstrum mean of the non-speech
frame of each talker from the cepstrum of the non-
speech frame of the talker to correct
multiplicative distortions that are dependent on
microphone characteristics and spatial transfer
characteristics from the mouth of the talker to
the microphone.

6. A speech processing apparatus comprising:

means for determining a spectrum for each 
frame by the Fourier transform; 

means for determining a cepstrum from the 
spectrum obtained;

means for determining for each talker a 
cepstrum mean of a speech frame and a cepstrum 
mean of a non-speech frame, separately, from the
cepstrums obtained; and 

means for subtracting the cepstrum mean of 
the speech frame of each talker from the cepstrum 
of the speech frame of the talker and for 
subtracting the cepstrum mean of the non-speech
frame of each talker from the cepstrum of the non- 
speech frame of the talker to correct 
multiplicative distortions that are dependent on 
microphone characteristics and spatial transfer 
characteristics from the mouth of the talker to 
the microphone. 
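
The normalization recited in claim 6 may be illustrated, without limitation, as follows. A multiplicative (convolutional) distortion in the spectrum becomes an additive offset in the cepstrum, so removing a mean removes the offset. The sketch assumes a real cepstrum computed from the log magnitude spectrum, means taken over all available frames of one talker, and frame labels supplied by the caller; the function names and the `is_speech` argument are hypothetical.

```python
import numpy as np


def cepstrum(frame, n_fft=256, eps=1e-10):
    """Real cepstrum of one frame: inverse transform of the log magnitude spectrum."""
    spec = np.abs(np.fft.rfft(frame, n=n_fft))
    return np.fft.irfft(np.log(spec + eps), n=n_fft)


def normalize_talker(cepstra, is_speech):
    """Subtract separate cepstrum means for one talker's speech and non-speech frames.

    cepstra   : array of shape (frames, coefficients) for a single talker
    is_speech : boolean label per frame (assumed to come from a voice decision)
    """
    cepstra = np.asarray(cepstra, dtype=float)
    is_speech = np.asarray(is_speech, dtype=bool)
    out = cepstra.copy()
    for mask in (is_speech, ~is_speech):
        if mask.any():
            # mean of this frame class only, removed from that class only
            out[mask] -= cepstra[mask].mean(axis=0)
    return out
```

The function would be called once per talker, so that each talker's speech frames and non-speech frames are corrected with their own means.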

7. A speech processing method comprising: 

a generation step for generating a pseudo 
acoustic echo signal based on a current impulse 
response simulating an acoustic echo transfer path 
and on a source signal;

a supply step for holding the current impulse
response and supplying the current impulse 
response to said generation step; 

an elimination step for subtracting said
pseudo acoustic echo signal from a microphone
input signal to remove an acoustic echo component
and thereby generate an acoustic echo-canceled
signal;

an update step for continually updating the
impulse response by using said source signal, said
acoustic echo-canceled signal and the current
impulse response held by the supply step and for
supplying the updated impulse response to said
supply step;

a decision step for checking, in each frame,
whether or not a voice is included in the
microphone input signal, by using time domain
information and frequency domain information of
said acoustic echo-canceled signal;

a storage step for storing one or more
impulse responses; and

a control step for, in a frame for which the
result of decision made by said decision step is
negative, storing in said storage step the current
impulse response held by the supply means and, in
a frame for which the result of decision is
positive, retrieving one of the impulse responses
stored in said storage step and supplying it to
said supply step.

8. A speech processing method as claimed in
claim 7, wherein said acoustic echo-canceled 
signal is used for speech recognition. 

9. A speech processing method as claimed in 
claim 8, further comprising:

a step for determining a spectrum for each 
frame by performing the Fourier transform on said 
acoustic echo-canceled signal; 

a step for successively determining a
spectrum mean for each frame based on the spectrum
obtained; and

a step for successively subtracting
the spectrum mean from the spectrum calculated for
each frame from said acoustic echo-canceled signal
to remove additive noise of an unknown source.

10. A speech processing method as claimed in 
claim 8, further comprising: 

a step for determining a spectrum for each 
frame by performing the Fourier transform on said 
acoustic echo-canceled signal;

a step for successively determining a
spectrum mean for each frame based on the spectrum
obtained; 

a step for successively subtracting the 
spectrum mean from the spectrum calculated for 
each frame from said acoustic echo-canceled signal 
to remove additive noise of an unknown source; 

a step for determining a cepstrum from the 
spectrum removed of the additive noise; 

a step for determining for each talker a 
cepstrum mean of a speech frame and a cepstrum 
mean of a non-speech frame, separately, from the
cepstrums obtained; and 

a step for subtracting the cepstrum mean of 
the speech frame of each talker from the cepstrum 
of the speech frame of the talker and for 
subtracting the cepstrum mean of the non-speech
frame of each talker from the cepstrum of the non- 
speech frame of the talker to correct 
multiplicative distortions that are dependent on 
microphone characteristics and spatial transfer 
characteristics from the mouth of the talker to 
the microphone. 

11. A speech processing method as claimed in 
claim 8, further comprising: 

a step for determining a spectrum for each
frame by performing the Fourier transform on said
acoustic echo-canceled signal;

a step for determining a cepstrum from the
spectrum obtained;

a step for determining for each talker a
cepstrum mean of a speech frame and a cepstrum
mean of a non-speech frame, separately, from the
cepstrums obtained; and

a step for subtracting the cepstrum mean of 
the speech frame of each talker from the cepstrum 
of the speech frame of the talker and for 
subtracting the cepstrum mean of the non-speech 
frame of each talker from the cepstrum of the non- 
speech frame of the talker to correct 
multiplicative distortions that are dependent on 
microphone characteristics and spatial transfer 
characteristics from the mouth of the talker to 
the microphone.

12. A speech processing method comprising: 

a step for determining a spectrum for each
frame by the Fourier transform;

a step for determining a cepstrum from the
spectrum obtained;

a step for determining for each talker a
cepstrum mean of a speech frame and a cepstrum
mean of a non-speech frame, separately, from the
cepstrums obtained; and

a step for subtracting the cepstrum mean of 
the speech frame of each talker from the cepstrum 
of the speech frame of the talker and for 
subtracting the cepstrum mean of the non-speech
frame of each talker from the cepstrum of the non- 
speech frame of the talker to correct 
multiplicative distortions that are dependent on 
microphone characteristics and spatial transfer 
characteristics from the mouth of the talker to 
the microphone.



